SUMMARY We propose the use of QR factorization with sort
and Dijkstra's algorithm to decrease the computational
complexity of the sphere decoder used for ML detection of
signals on the multi-antenna fading channel. QR factorization
with sort decreases the complexity of the searching part of
the decoder at the cost of a small increase in the complexity
of the preprocessing part. Dijkstra's algorithm decreases the
complexity of the searching part at the cost of an increase in
the storage complexity. Computer simulations demonstrate that
the proposed methods reduce the complexity of the decoder
significantly.

key words: MIMO fading channel, maximum likelihood detection,
sphere decoder, lattice

1. Introduction 

In multi-antenna mobile communication, it is well known
that the use of multiple transmit and receive antennas
linearly increases the channel capacity of a frequency
nonselective fading channel when the channel state
information (CSI) is known at the receiver [1], [2].
In uncoded multi-antenna systems, the computational
complexity of the naive maximum-likelihood (ML) decoding
algorithm grows exponentially with the number of transmit
antennas, so an efficient algorithm is needed to implement
ML decoding. On the multi-antenna fading channel, a receiver
that has CSI can compute the set of ideal received signal
points, which account only for the influence of the fading
and disregard the additive noise. When the noise at each
receive antenna is additive white Gaussian, ML decoding
therefore amounts to searching for the ideal received signal
point closest to the actual received signal point. By
regarding the ideal received points as lattice points, the
ML decoding problem is reduced to the classical closest
lattice point search problem. Fincke and Pohst proposed an
efficient algorithm for that problem [3], and it was recently
applied to the decoding problem and called the sphere
decoder (SD) [4].

SD can be divided into two parts. The first part computes
the QR factorization (or Cholesky factorization) of the
fading matrix. The second part computes the ML estimate of
the transmitted signal from the received signal and the QR
factorization. We call the first part the preprocessing part
and the second part the searching part. In this paper we
propose QR factorization with sort and the use of Dijkstra's
algorithm for decreasing the computational complexity of SD.
QR factorization with sort gives an efficient order of
decisions on the signal components. It reduces the complexity
of the searching part at the cost of an increase in the
complexity of the preprocessing part. Dijkstra's algorithm is
an efficient algorithm for solving the shortest path problem
in a graph. We apply this algorithm to the searching part,
where it reduces the computational complexity at the cost of
an increase in the storage complexity.

QR factorization with sort modifies only the preprocessing
part, and the use of Dijkstra's algorithm modifies only the
searching part. Thus these improvements are independent and
can be used together or alone.

This paper is organized as follows: Section 2 introduces
the channel model of the multi-antenna fading channel and
shows how the original SD works. Section 3 introduces QR
factorization with sort and Section 4 proposes the
application of Dijkstra's algorithm to SD. Section 5
compares, by computer simulations, the complexity of the
original SD with that of SD using the proposed methods. These
simulations show that the proposed methods decrease the
complexity of the decoder significantly.

2. Original sphere decoder 

2.1 Channel model 

Suppose that we have an uncoded system with $t$ transmit
antennas and $r$ receive antennas, that the noise at each
receive antenna is additive white Gaussian, and that the
receiver has CSI. At the transmitter, information sources are
demultiplexed into $t$ substreams and transmitted by the
transmit antennas. Let $a$ be the $(t \times 1)$ vector
consisting of the complex envelopes of the transmitted
signals with the signal constellation $S$, $M$ the
$(r \times t)$ fading matrix whose $(k,j)$ entry is the
complex fading coefficient between the $j$-th transmit
antenna and the $k$-th receive antenna, $v$ the
$(r \times 1)$ complex vector whose components are the noise
at each receive antenna, and $x$ the $(r \times 1)$ complex
vector whose components are the received signal components at
each receive antenna. The model of this channel is written as



$$x = Ma + v \quad (a \in S^t). \tag{1}$$

On the channel described by Eq. (1), when the components
of $v$ are independent complex Gaussian random variables,
the ML decoding problem can be reduced to the closest
lattice point search problem for the set of lattice points
$\{Ma \mid a \in S^t\}$ and a received signal point $x$.
See [4], [5] for details.
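
As a point of reference for the reduction above, the following is a minimal sketch of the naive ML decoder for Eq. (1), which simply tries every point of $S^t$. The 4-QAM constellation, the antenna numbers, the noise level, and all function names are assumptions chosen only to keep the example small; they are not taken from the paper.

```python
import itertools
import random

# 4-QAM is used here only to keep the example small (an assumption).
QAM4 = [complex(re, im) for re in (-1, 1) for im in (-1, 1)]

def mat_vec(M, a):
    """Multiply an r x t matrix (list of rows) by a length-t vector."""
    return [sum(m_kj * a_j for m_kj, a_j in zip(row, a)) for row in M]

def sq_dist(x, y):
    """Squared Euclidean distance between two complex vectors."""
    return sum(abs(x_k - y_k) ** 2 for x_k, y_k in zip(x, y))

def brute_force_ml(M, x, constellation):
    """Return the a in S^t minimizing ||x - M a||^2 by exhaustive search."""
    t = len(M[0])
    return min(itertools.product(constellation, repeat=t),
               key=lambda a: sq_dist(x, mat_vec(M, a)))

def cgauss(rng, sigma=1.0):
    """One sample of a circularly symmetric complex Gaussian, variance sigma^2."""
    s = sigma / 2 ** 0.5
    return complex(rng.gauss(0, s), rng.gauss(0, s))

rng = random.Random(0)
t, r = 2, 3
M = [[cgauss(rng) for _ in range(t)] for _ in range(r)]   # fading matrix
a = tuple(rng.choice(QAM4) for _ in range(t))             # transmitted vector
v = [cgauss(rng, sigma=0.01) for _ in range(r)]           # additive noise
x = [y + n for y, n in zip(mat_vec(M, a), v)]             # Eq. (1)
a_hat = brute_force_ml(M, x, QAM4)
```

The search visits $|S|^t$ points, which is the exponential growth that motivates the sphere decoder.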

We also remark that when we use linear space- 
time coding, the ML decoding problem is reduced to 
the closest lattice point search problem by describing 
the channel as Eq. (1) [6]. 

2.2 Algorithm 

In this section, we show how the original SD works
when $t \le r$. Fincke and Pohst's original method treats
real numbers, and complex numbers can be treated in
almost the same way [5].

To implement ML decoding on the channel described by
Eq. (1), we must compute the ML transmit signal
$\hat{a} = (\hat{a}_1, \ldots, \hat{a}_t)^T$ equal to

$$\arg\min_{a \in S^t} \|x - Ma\|. \tag{2}$$



To compute Eq. (2), SD considers a sphere with center 
at the received signal in the complex Euclidean space. 
If there are lattice points in the sphere, the closest point 
is in the sphere. SD takes a suitable value as the radius 
of sphere, and searches for lattice points in the sphere. 

First we compute QR factorization of M and ob- 
tain an upper triangular matrix R and a unitary matrix 
Q with M = QR. Since Q is a unitary matrix, 

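$$\|x - Ma\|^2 = \|Q^* x - Q^* M a\|^2 = \|Q^* x - Ra\|^2. \tag{3}$$

Let $\rho = Q^* x = (\rho_1, \ldots, \rho_r)^T$, $C$ the square
of a suitable radius, and $r_{ij}$ the $(i,j)$ element of $R$.
The lattice points $Ma$ that satisfy

$$\|x - Ma\|^2 = \|Ra - \rho\|^2 = \sum_{i=1}^{t} \Bigl| \rho_i - \sum_{j=i}^{t} r_{ij} a_j \Bigr|^2 + \sum_{i=t+1}^{r} |\rho_i|^2 \le C \tag{4}$$

are in the sphere. Satisfying Eq. (4) with $a = \hat{a}$ is
equivalent to satisfying the following inequalities for all
$k = 1, \ldots, t$:

$$\Bigl| r_{kk} \hat{a}_k - \Bigl( \rho_k - \sum_{j=k+1}^{t} r_{kj} \hat{a}_j \Bigr) \Bigr|^2 \le C' - \sum_{i=k+1}^{t} \Bigl| r_{ii} \hat{a}_i - \Bigl( \rho_i - \sum_{j=i+1}^{t} r_{ij} \hat{a}_j \Bigr) \Bigr|^2, \tag{5}$$

where $C' = C - \sum_{i=t+1}^{r} |\rho_i|^2$. SD computes
$\hat{a}$ satisfying Eq. (4) by deciding $\hat{a}_i$ in order
of $i = t, \ldots, 1$ from Eq. (5).

To simplify Eq. (5), we define $S_k$ and $D_k$ as

$$S_k = \Bigl( \rho_k - \sum_{j=k+1}^{t} r_{kj} \hat{a}_j \Bigr) / r_{kk}, \tag{6}$$

$$D_k = \sum_{i=k+1}^{t} \Bigl| r_{ii} \hat{a}_i - \Bigl( \rho_i - \sum_{j=i+1}^{t} r_{ij} \hat{a}_j \Bigr) \Bigr|^2. \tag{7}$$

Then Eq. (5) with $k = i$ is written by $S_i$ and $D_i$ as

$$|\hat{a}_i - S_i|^2 \le (C' - D_i)/|r_{ii}|^2. \tag{8}$$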
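
The quantities $S_k$ and $D_k$ can be accumulated while moving from $k = t$ down to $k = 1$, because $D_{k-1} = D_k + |r_{kk}|^2 |\hat{a}_k - S_k|^2$. The following is a minimal sketch of that recursion, assuming a square upper triangular $R$ (so $r = t$) stored as a list of rows; the function name `partial_terms` and the numerical values are hypothetical.

```python
def partial_terms(R, rho, a_hat):
    """Return lists S and D with S[k-1] = S_k of Eq. (6) for k = 1..t and
    D[k] = D_k of Eq. (7) for k = 0..t (so D[t] = 0)."""
    t = len(a_hat)
    S = [0j] * t
    D = [0.0] * (t + 1)
    for k in range(t, 0, -1):      # k = t, ..., 1 as in the text
        i = k - 1                  # 0-based row index
        interference = sum(R[i][j] * a_hat[j] for j in range(i + 1, t))
        S[i] = (rho[i] - interference) / R[i][i]
        # D_{k-1} = D_k + |r_kk|^2 |a_k - S_k|^2
        D[k - 1] = D[k] + abs(R[i][i]) ** 2 * abs(a_hat[i] - S[i]) ** 2
    return S, D

# Small illustrative example (values are arbitrary).
R = [[2, 1j, 0.5], [0, 1.5, -1], [0, 0, 1]]
a_hat = [1 + 1j, -1 + 1j, 1 - 1j]
rho = [sum(R[i][j] * a_hat[j] for j in range(3)) for i in range(3)]
rho[0] += 1                        # perturb so that the distance is nonzero
S, D = partial_terms(R, rho, a_hat)
# Direct evaluation of the first sum in Eq. (4) for comparison with D[0].
full = sum(abs(rho[i] - sum(R[i][j] * a_hat[j] for j in range(i, 3))) ** 2
           for i in range(3))
```

Here `D[0]` and `full` agree up to rounding, which is the equivalence between Eq. (4) and Eq. (5) for $k = 1$ when $r = t$.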

The candidates of $\hat{a}_i$ satisfying Eq. (8) are in the
circle with the center $S_i$ and the radius
$\sqrt{(C' - D_i)/|r_{ii}|^2}$ on the complex plane. If there
is no candidate of $\hat{a}_i$, SD goes back to the decision
on $\hat{a}_{i+1}$. If there are some candidates, we must
choose one of them. To reduce the complexity of the searching
part, a method starting with the $\hat{a}_i$ nearest to $S_i$
among all the candidates is proposed in [7]. When SD only
treats real numbers, it is clear which $\hat{a}_i$ is nearest
to $S_i$. But when SD treats complex numbers, finding the
$\hat{a}_i$ nearest to $S_i$ requires computing
$|\hat{a}_i - S_i|^2$ for all $\hat{a}_i \in S$ and does not
necessarily reduce the complexity. So we employ another
method. In Section 5, SD chooses $\hat{a}_i$ in increasing
order of $|\Re(\hat{a}_i - S_i)|$ and, if there are two or
more candidates of $\hat{a}_i$ with the same value of
$|\Re(\hat{a}_i - S_i)|$, SD chooses the $\hat{a}_i$ with the
smaller $|\Im(\hat{a}_i - S_i)|$, where $\Re(\cdot)$ denotes
the real part and $\Im(\cdot)$ denotes the imaginary part.
When $\bigl(\Re(\hat{a}_i - S_i)\bigr)^2$ is larger than
$(C' - D_i)/|r_{ii}|^2$, SD concludes that there is no
$\hat{a}_i$ in the circle any more.
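
The enumeration rule just described can be sketched as follows. The function name `ordered_candidates`, the argument `bound` (standing for $(C' - D_i)/|r_{ii}|^2$), and the 16-QAM constellation are assumptions for illustration. For clarity the sketch sorts the whole constellation, whereas an implementation would normally exploit the grid structure of QAM instead.

```python
# 16-QAM is used only as a small illustrative constellation (an assumption).
QAM16 = [complex(re, im) for re in (-3, -1, 1, 3) for im in (-3, -1, 1, 3)]

def ordered_candidates(S_i, bound, constellation):
    """List the points a of the constellation with |a - S_i|^2 <= bound,
    ordered by |Re(a - S_i)| and then by |Im(a - S_i)|."""
    order = sorted(constellation,
                   key=lambda a: (abs((a - S_i).real), abs((a - S_i).imag)))
    result = []
    for a in order:
        if (a - S_i).real ** 2 > bound:
            break  # every later point has a real offset at least as large
        if abs(a - S_i) ** 2 <= bound:
            result.append(a)
    return result

candidates = ordered_candidates(0.6 - 0.2j, 2.0, QAM16)
```

The early `break` is the stopping test on the squared real offset; the full squared distance is evaluated only for points that pass it.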

When $\hat{a}$ satisfies all inequalities (8), SD concludes
that $M\hat{a}$ is a lattice point in the sphere. Then the new
radius is set to $\|M\hat{a} - x\|$ and SD repeats the same
operations until there is no lattice point in the sphere;
the last point found is the closest point. If there is no
lattice point in the sphere with the radius given first,
SD either declares an erasure of the signal or increases the
radius.
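
Putting the pieces of this section together, the searching part can be sketched as a depth-first search that shrinks the sphere whenever a lattice point is found. This is a minimal sketch, not the authors' implementation: it assumes the preprocessing has already produced a square upper triangular $R$ and $\rho = Q^* x$ (so $r = t$ and $C' = C$), and the name `sphere_decode` and the numerical values are hypothetical.

```python
def sphere_decode(R, rho, constellation, C):
    """Depth-first search for the a in S^t minimizing ||R a - rho||^2 among
    points with squared distance <= C. R is t x t upper triangular (list of
    rows). Returns (best_a, best_dist), or (None, C) if the sphere is empty."""
    t = len(R)
    a_hat = [0j] * t
    best = [None, C]  # best point so far and current squared radius

    def decide(i, D_i):
        # D_i: accumulated squared distance of rows i+1, ..., t-1 (0-based)
        interference = sum(R[i][j] * a_hat[j] for j in range(i + 1, t))
        S_i = (rho[i] - interference) / R[i][i]
        order = sorted(constellation,
                       key=lambda c: (abs((c - S_i).real), abs((c - S_i).imag)))
        for a in order:
            bound = (best[1] - D_i) / abs(R[i][i]) ** 2
            if (a - S_i).real ** 2 > bound:
                break                      # no candidate left in the circle
            d = D_i + abs(R[i][i]) ** 2 * abs(a - S_i) ** 2
            if d > best[1]:
                continue                   # outside the current sphere
            a_hat[i] = a
            if i == 0:
                best[0], best[1] = tuple(a_hat), d   # shrink the sphere
            else:
                decide(i - 1, d)

    decide(t - 1, 0.0)
    return best[0], best[1]

QAM4 = [complex(re, im) for re in (-1, 1) for im in (-1, 1)]
R = [[2, 0.5 + 0.5j, -0.3j], [0, 1.5, 0.4], [0, 0, 1.2]]
a_true = (1 + 1j, -1 - 1j, 1 - 1j)
e = (0.1, -0.1j, 0.05)                     # small offset playing the role of noise
rho = [sum(R[i][j] * a_true[j] for j in range(3)) + e[i] for i in range(3)]
best_a, best_dist = sphere_decode(R, rho, QAM4, float("inf"))
```

Calling the function with a radius that is too small returns `(None, C)`, which corresponds to the case where SD declares an erasure or retries with a larger radius.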

3. QR factorization with sort 

3.1 Changing the order of decisions on $\hat{a}_i$

In the previous section, we obtained an inequality for
each signal component $\hat{a}_i$. In the searching part, the
computational complexity largely depends on the order of
decisions on $\hat{a}_i$. In this section, we consider an
efficient order of decisions on $\hat{a}_i$.
For a permutation $\sigma$ and $M = (v_1, \ldots, v_t)$, where
$v_j$ denotes the $j$-th column of $M$ (not to be confused
with the noise vector $v$), Eq. (1) is equivalently described
by $P = (v_{\sigma(1)}, \ldots, v_{\sigma(t)})$ and
$p = (a_{\sigma(1)}, \ldots, a_{\sigma(t)})^T$ as

$$x = Pp + v. \tag{9}$$

When SD processes the channel described as Eq. (9), the
order of decisions on $\hat{a}_i$ follows the order of the
components of $p$. So we can change the order of decisions
on $\hat{a}_i$ arbitrarily by $\sigma$. SD obtains $\hat{p}$,
the ML estimate of $p$, and one can get the ML estimate of
the original channel (1) from $\hat{p}$ by the inverse
permutation $\sigma^{-1}$.
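
The bookkeeping behind Eq. (9) is small but easy to get wrong, so a minimal sketch may help. Here the permutation is stored 0-based as a list `sigma`, and the helper names are hypothetical.

```python
def permute_columns(M, sigma):
    """Return P whose j-th column is column sigma[j] of M (0-based)."""
    return [[row[s] for s in sigma] for row in M]

def permute_vector(a, sigma):
    """Return p with p[j] = a[sigma[j]]."""
    return [a[s] for s in sigma]

def unpermute_vector(p, sigma):
    """Apply the inverse permutation: recover a from p."""
    a = [None] * len(p)
    for j, s in enumerate(sigma):
        a[s] = p[j]
    return a

def mat_vec(M, a):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * c for m, c in zip(row, a)) for row in M]

M = [[1, 2, 3], [4, 5, 6]]
a = [1j, 2, -1]
sigma = [2, 0, 1]
P = permute_columns(M, sigma)
p = permute_vector(a, sigma)
```

Because $Pp = Ma$ for every $a$, decoding the permuted channel and then applying `unpermute_vector` gives the same ML estimate as decoding the original channel.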

Next we consider the efficient order of decisions on
$\hat{a}_i$. The number of candidates of $\hat{a}_i$
satisfying Eq. (8) is proportional to

$$(C' - D_i)/|r_{ii}|^2. \tag{10}$$

Intuitively we can reduce the complexity of the searching
part by changing the order of decisions on $\hat{a}_i$ so that
the value of Eq. (10) is small for large $i$, because the
$\hat{a}_i$ are decided in order of $i = t, \ldots, 1$. Because
the value of Eq. (10) is inversely proportional to
$|r_{ii}|^2$, we can reduce the complexity by constructing the
matrix $R$ so that $|r_{ii}|$ takes a large value for large $i$.

Now we propose QR factorization with sort to compute an
efficient order of decisions on $\hat{a}_i$. QR factorization
computes $r_{ii}$ in increasing order of $i$. QR
factorization with sort permutes the columns of the matrix
being factorized before each computation of $r_{ii}$ such
that $r_{ii}$ is minimized. Because
$\prod_{i=1}^{t} |r_{ii}|^2 = \det(M^* M)$ does not depend on
the column order, making $|r_{ii}|$ small for small $i$ tends
to leave large $|r_{ii}|$ for large $i$. QR factorization with
sort is used for decreasing the error probability of the
nulling and canceling decoder in [8]. In this paper, we use
QR factorization with sort for decreasing the complexity of
the ML decoder without changing the error probability.

In [9], it is claimed that the order maximizing
$\min_{1 \le i \le t} |r_{ii}|$ is optimal for reducing the
computational complexity of the searching part. But the
computation of this order requires QR factorizations $t^2/2$
times. In the mobile environment, the fading matrix $M$ often
changes. So the computational complexity of the preprocessing
part proposed in [9] is not negligible, because the
preprocessing part is computed whenever the fading matrix $M$
changes. In [3], [9], it is also said that we can reduce the
complexity of SD by reordering the decisions on $\hat{a}_i$
according to the norms of the corresponding basis vectors.
In Section 5, we compare QR factorization with sort and the
other methods by computer simulations.

3.2 Algorithm 

In this subsection, we show how QR factorization with
sort works. QR factorization with sort gives a permutation
realizing an efficient order of decisions on $\hat{a}_i$ and
the QR factorization of the permuted matrix $P$ in Eq. (9).
The following algorithm is almost the same as that of [8].
The method in [8] is based on the Gram-Schmidt algorithm,
whereas our method is based on the Householder method. It is
known that the Householder method is numerically more
stable than the Gram-Schmidt algorithm [10].

The ordinary QR factorization of $M$ can be sketched as
follows: Compute a unitary matrix $Q_1$ such that the first
column of $Q_1 M$ is $(r_{11}, 0, \ldots, 0)^T$. Let $M_2$ be
the $((r-1) \times (t-1))$ submatrix of $Q_1 M$ with the first
column and the first row of $Q_1 M$ removed. Compute a
unitary matrix $Q_2$ such that the first column of $Q_2 M_2$
is $(r_{22}, 0, \ldots, 0)^T$. The computation process is
recursively repeated until $i = t$. See [10] for details.

We now describe QR factorization with sort. Observe that
in the ordinary QR factorization $r_{11}$ is equal to the
norm of the first column vector of $M$. In order to minimize
$r_{11}$, we exchange the first column of $M$ with the column
of minimum norm. Let $M'$ be the column-exchanged version
of $M$. Compute a unitary matrix $Q'_1$ such that the first
column of $Q'_1 M'$ is $(r_{11}, 0, \ldots, 0)^T$. Let $M_2$
be the $((r-1) \times (t-1))$ submatrix of $Q'_1 M'$ with the
first column and the first row of $Q'_1 M'$ removed.
Exchange the first column of $M_2$ with the column of minimum
norm in $M_2$. Let $M'_2$ be the column-exchanged version of
$M_2$. Compute a unitary matrix $Q'_2$ such that the first
column of $Q'_2 M'_2$ is $(r_{22}, 0, \ldots, 0)^T$. The
computation process is recursively repeated until $i = t$.

With this process we get a QR factorization $QR$ of the
column-permuted matrix $P$ of $M$. If we apply the searching
part of Section 2 to this $QR$, we obtain the ML estimate
$\hat{p}$ more efficiently. The ML estimate of $a$ is then
obtained by the inverse permutation.
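
To make the sorting step concrete, here is a minimal sketch of a sorted QR factorization. Note the substitution: for brevity it uses the modified Gram-Schmidt formulation in the spirit of [8], not the Householder method adopted in this paper, so it illustrates the column selection rule rather than the numerically preferred implementation. The function name `sorted_qr` and the return convention are hypothetical.

```python
def sorted_qr(M):
    """Sorted QR factorization by modified Gram-Schmidt.
    M is an r x t complex matrix given as a list of rows (t <= r, full rank).
    Returns (Q_cols, R, perm): Q_cols is a list of t orthonormal columns,
    R is t x t upper triangular, and column j of Q R is column perm[j] of M."""
    r, t = len(M), len(M[0])
    cols = [[complex(M[k][j]) for k in range(r)] for j in range(t)]
    R = [[0j] * t for _ in range(t)]
    perm = list(range(t))
    for i in range(t):
        # choose the remaining column of minimum norm, so r_ii is minimized
        k = min(range(i, t), key=lambda l: sum(abs(c) ** 2 for c in cols[l]))
        cols[i], cols[k] = cols[k], cols[i]
        perm[i], perm[k] = perm[k], perm[i]
        for row in R:                       # keep the computed rows of R consistent
            row[i], row[k] = row[k], row[i]
        norm = sum(abs(c) ** 2 for c in cols[i]) ** 0.5
        R[i][i] = complex(norm)
        cols[i] = [c / norm for c in cols[i]]
        for l in range(i + 1, t):
            # remove the component of column l along the new basis vector
            R[i][l] = sum(q.conjugate() * c for q, c in zip(cols[i], cols[l]))
            cols[l] = [c - R[i][l] * q for c, q in zip(cols[l], cols[i])]
    return cols, R, perm

M = [[3, 1], [4, 0]]
Q_cols, R, perm = sorted_qr(M)
```

For this $M$ the unsorted factorization has diagonal entries $(5, 0.8)$, whereas the sorted one has $(1, 4)$: the product is unchanged, but the larger entry has moved to the row that the searching part decides first.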

4. Dijkstra's algorithm 

In this section we apply Dijkstra's algorithm to the
searching part to reduce its computational complexity at the
cost of an increase in the storage complexity. Dijkstra's
algorithm is an efficient algorithm for finding the shortest
path from a starting vertex to a destination in a weighted
directed graph [11]. In this algorithm, the vertices of the
graph are visited in order of their distance from the
starting vertex.

The decisions on $\hat{a}_i$ essentially construct a tree
in which the nodes at the $k$-th level correspond to the
candidates of $\hat{a}_{t-k+1}$ [5], and the root is placed at
the 0-th level. Set the weight of the branch from the node
$\hat{a}_i$ to its parent to

$$\Bigl| r_{ii} \hat{a}_i - \Bigl( \rho_i - \sum_{j=i+1}^{t} r_{ij} \hat{a}_j \Bigr) \Bigr|^2 = |r_{ii}|^2 |\hat{a}_i - S_i|^2. \tag{11}$$

Then the distance of the node $\hat{a}_i$ from the root is
equal to $D_{i-1}$. The nodes having the same parent are
arranged in increasing order of the distance from left to
right.

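If we use Dijkstra's algorithm to find the shortest
path from the root to one of the nodes at the bottom level,
we get the node with the minimum $D_0 = \|x - M\hat{a}\|^2$
among all nodes at the bottom level, and it corresponds
to the ML estimate. (The equality holds for $r = t$; for
$r > t$ the two sides differ only by a constant independent
of $\hat{a}$.)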
Dijkstra's algorithm applied to this tree proceeds as follows.

1. Create an empty priority queue for nodes. The
priority is the distance from the root.

2. Insert the leftmost node at the first level into the
priority queue.

3. Select the node A having the smallest distance in the
priority queue and remove it from the priority
queue. If the level of A is t, finish the algorithm.

4. Insert the leftmost child node of A into the priority
queue.

5. Insert the right neighboring node of A, if any, into
the priority queue.

6. Go back to Step 3.

Because the branch weights are nonnegative, the node
selected in Step 3 has a distance no larger than that of any
node selected later, so the node at the bottom level that is
selected first has the minimum value of $D_0$ among all nodes
at the bottom level.
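
Steps 1 to 6 can be sketched with a binary heap as the priority queue. This is a minimal sketch under stated assumptions, not the authors' implementation: it assumes a square upper triangular $R$ and $\rho = Q^* x$ are already available, it sorts the children of a node eagerly by their full distance for clarity, and the name `dijkstra_decode` and the numerical values are hypothetical.

```python
import heapq
import itertools

def dijkstra_decode(R, rho, constellation):
    """Best-first search for the a in S^t minimizing ||R a - rho||^2, with R a
    t x t upper triangular matrix (list of rows).
    Returns (a, distance, number of nodes removed from the queue)."""
    t = len(R)
    tie = itertools.count()  # tie-breaker so the heap never compares paths

    def children(path, dist):
        """Children of a node as a list of (distance, symbol), left to right.
        path holds the symbols decided so far, for rows t-1, t-2, ..."""
        i = t - 1 - len(path)                       # row decided next
        interference = sum(R[i][t - 1 - m] * s for m, s in enumerate(path))
        S_i = (rho[i] - interference) / R[i][i]
        return sorted(((dist + abs(R[i][i]) ** 2 * abs(a - S_i) ** 2, a)
                       for a in constellation), key=lambda pair: pair[0])

    def push(queue, parent, siblings, index):
        dist, symbol = siblings[index]
        heapq.heappush(queue, (dist, next(tie), parent + (symbol,),
                               parent, siblings, index))

    queue = []                                       # Step 1
    push(queue, (), children((), 0.0), 0)            # Step 2
    visited = 0
    while queue:
        dist, _, path, parent, siblings, index = heapq.heappop(queue)  # Step 3
        visited += 1
        if len(path) == t:
            return tuple(reversed(path)), dist, visited
        push(queue, path, children(path, dist), 0)   # Step 4: leftmost child
        if index + 1 < len(siblings):                # Step 5: right neighbor
            push(queue, parent, siblings, index + 1)

QAM4 = [complex(re, im) for re in (-1, 1) for im in (-1, 1)]
R = [[2, 0.5 + 0.5j, -0.3j], [0, 1.5, 0.4], [0, 0, 1.2]]
a_true = (1 + 1j, -1 - 1j, 1 - 1j)
e = (0.1, -0.1j, 0.05)                               # small offset playing the role of noise
rho = [sum(R[i][j] * a_true[j] for j in range(3)) + e[i] for i in range(3)]
a_hat, dist, visited = dijkstra_decode(R, rho, QAM4)
```

In this low-noise example only one node per level is removed from the queue before the bottom level is reached. No initial radius is needed, but the queue holds up to two new entries for every node removed, which is the storage cost mentioned in the text.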

In the sequel, we refer to SD not using Dijkstra's 
algorithm as SD, and SD using Dijkstra's algorithm as 
Dijkstra's algorithm. 

Figure 1 shows an example of the order of search 
by Dijkstra's algorithm and SD. The values in circles 
show the distance from the root. The numbers in upper 
rectangles show the order by Dijkstra's algorithm and 
the numbers in lower rectangles show the order by SD. 
SD is the depth first search algorithm for a tree. In this 
case, the number of searched nodes is 5 by Dijkstra's 
algorithm and is 8 by SD. 

Dijkstra's algorithm searches only the nodes whose
distance is smaller than the minimum distance of the nodes
at the bottom level, whereas SD searches the nodes whose
distance is smaller than $C$, and $C$ must be greater than
the minimum distance of the nodes at the bottom level in
order for ML detection to succeed. So the number of nodes
searched by Dijkstra's algorithm is smaller than that of SD.
However, because Dijkstra's algorithm uses a priority queue,
the storage complexity increases. In addition, Dijkstra's
algorithm does not require the radius of the sphere to be
set initially, and it always finds the ML estimate without
retrying the search with an increased radius.




Fig. 1 The order of search by Dijkstra's algorithm and the orig- 
inal SD 



Arranging the nodes having the same parent according to
the distance requires computing $|\hat{a}_i - S_i|^2$ for
those nodes. Instead of doing this, our algorithm considers
the candidates of $\Re(\hat{a}_{t-k+1})$ and the candidates of
$\Im(\hat{a}_{t-k+1})$ separately in Section 5. Then the
depth of the tree is equal to $2t$ excluding the root, and
arranging the nodes having the same parent only requires
computing $|\Re(\hat{a}_i) - \Re(S_i)|$ and
$|\Im(\hat{a}_i) - \Im(S_i)|$.
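
The reason the split works is that $|\hat{a}_i - S_i|^2 = (\Re(\hat{a}_i) - \Re(S_i))^2 + (\Im(\hat{a}_i) - \Im(S_i))^2$, so each complex level can be replaced by two real levels whose candidates are ordered by one-dimensional distances. A minimal sketch, assuming the 64-QAM levels $\pm 1, \pm 3, \pm 5, \pm 7$ on each axis; the names `PAM8` and `sorted_levels` and the value of $S_i$ are hypothetical.

```python
PAM8 = [-7, -5, -3, -1, 1, 3, 5, 7]  # assumed per-axis levels of 64-QAM

def sorted_levels(center, levels=PAM8):
    """Levels ordered by their distance from a real-valued center."""
    return sorted(levels, key=lambda u: abs(u - center))

S_i = 0.6 - 2.3j
re_order = sorted_levels(S_i.real)   # order of the candidates of Re(a_i)
im_order = sorted_levels(S_i.imag)   # order of the candidates of Im(a_i)

# The leftmost node of each real level gives the nearest complex point.
nearest = complex(re_order[0], im_order[0])
split = (re_order[0] - S_i.real) ** 2 + (im_order[0] - S_i.imag) ** 2
```

Each ordering needs only 8 comparisons of real offsets, instead of 64 evaluations of $|\hat{a}_i - S_i|^2$.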

5. Computer simulation 

In this section, we show how much the complexity of the
searching part is reduced by QR factorization with sort
and Dijkstra's algorithm, how much the complexity of the
preprocessing part is increased by QR factorization with
sort, and how much the storage complexity is increased by
Dijkstra's algorithm over an uncoded multi-antenna fading
channel. The radius of the sphere used by SD is defined so
that

$$\Pr\{\text{transmit point is in sphere}\} = \Pr\{C \ge \|v\|^2\} \approx 0.99, \tag{12}$$

where $C$ is the square of the radius and $v$ is the vector
whose elements are the noise at each receive antenna [5].
When there is no lattice point in the sphere, we increase
$C$ to $C + 1$ and continue until a lattice point is found.
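
A value of $C$ satisfying Eq. (12) can be computed numerically. The sketch below assumes that the $r$ noise components are i.i.d. circularly symmetric complex Gaussian with variance $\sigma^2$ each, in which case $\|v\|^2/\sigma^2$ follows a Gamma distribution with integer shape $r$ and unit scale, whose distribution function has a closed form. The parameter `sigma2` and the function names are assumptions; the paper itself does not state how $C$ is computed.

```python
import math

def erlang_cdf(y, r):
    """Pr{G <= y} for G ~ Gamma(shape r, scale 1), r a positive integer:
    1 - exp(-y) * sum_{k=0}^{r-1} y^k / k!."""
    term, total = 1.0, 1.0
    for k in range(1, r):
        term *= y / k
        total += term
    return 1.0 - math.exp(-y) * total

def squared_radius(r, sigma2, prob=0.99):
    """Smallest C with Pr{||v||^2 <= C} >= prob, found by bisection,
    assuming the r noise components are i.i.d. CN(0, sigma2)."""
    lo, hi = 0.0, 1.0
    while erlang_cdf(hi, r) < prob:   # bracket the quantile
        hi *= 2.0
    for _ in range(100):              # bisection on the normalized variable
        mid = 0.5 * (lo + hi)
        if erlang_cdf(mid, r) < prob:
            lo = mid
        else:
            hi = mid
    return sigma2 * hi

C_single = squared_radius(1, 1.0)     # one receive antenna, unit noise variance
C_eight = squared_radius(8, 1.0)      # eight receive antennas
```

For $r = 1$ the result reduces to $-\sigma^2 \ln(1 - 0.99) = \sigma^2 \ln 100$, which gives a quick sanity check of the routine.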

5.1 The system model 

We consider the following system model. 

• The number of transmit antennas is equal to the 
number of receive antennas. 

• The fading coefficients obey the $\mathcal{CN}(0, 1)$
distribution.

• The signal constellation for each transmit antenna 
is 64-QAM and all signals are drawn according to 
the uniform i.i.d. distribution. 

5.2 The computer simulations 

In this subsection we compare the complexities of the
proposed methods and other variants of SD. We remark that
all methods in this subsection perform ML decoding and hence
have the same error rate. First we compare the complexities
of SD without reordering of the decisions on $\hat{a}_i$ (SD),
SD reordering the decisions on $\hat{a}_i$ according to the
norms of the basis vectors (Norm-SD) [3], [9], SD reordering
the decisions on $\hat{a}_i$ so that
$\min_{1 \le i \le t} |r_{ii}|$ is maximized (Optimal-SD) [9],
and SD with QR factorization with sort (QR sort-SD). The SNR
is set to 26 dB. As the measure of complexity we use the
average number of real multiplications and divisions for
each processing. A complex multiplication is counted as three
real multiplications and seven real additions, and a complex
division as five real multiplications, two real divisions,
and nine real additions [12]. Figure 2 shows the complexity
of the searching part and Figure 3 shows the complexity of
the preprocessing part. When the number of transmit antennas
is 8, QR factorization with sort reduces the complexity of
the searching part by about 55 percent relative to the
original SD. However, Figure 3 shows that the complexity of
the preprocessing part increases by about 10 percent. Figure
4 shows the total complexity of SD for 10 transmissions with
the same fading matrix. In this case QR factorization with
sort reduces the complexity of SD by about 60 percent
relative to the original SD.
Fig. 4 The complexity of SD for 10 transmissions with the
same fading matrix

Next we compare the complexities of SD (SD), Dijkstra's
algorithm (Dijkstra), and both of them using QR
factorization with sort (QR sort-SD, QR sort+Dijkstra).

The number of antennas is set to 8. Figure 5 shows the
complexity of the searching part and Figure 6 shows the
cumulative distribution of the size of the priority queue
with QR factorization with sort. When the SNR is 26 dB,
Dijkstra's algorithm reduces the complexity of the searching
part by about 25 percent relative to the original SD, and
the combination of QR factorization with sort and Dijkstra's
algorithm reduces it by about 65 percent. Figure 5 also
shows that Dijkstra's algorithm is much faster than SD when
the SNR is low.
Fig. 6 The cumulative distribution of the size of priority queue

6. Conclusion

We proposed QR factorization with sort and the use of
Dijkstra's algorithm as methods for decreasing the
computational complexity of the sphere decoder. QR
factorization with sort reduces the complexity of the
searching part of the decoder with little increase in the
complexity of the preprocessing part. Because the
preprocessing part is computed only once for each fading
matrix and the increase in its complexity is small, the
total complexity of SD can be reduced. Dijkstra's algorithm
reduces the complexity of the searching part of the decoder
at the cost of an increase in the storage complexity. With
these reductions in complexity, the proposed methods enable
us to implement ML decoding for multi-antenna systems with
a larger number of transmit antennas.